Searching fast for a target on a DNA without falling to traps 
Genomic expression depends critically both on the ability of regulatory proteins to locate specific target sites on DNA within seconds and on the formation of long-lived (many minutes) complexes between these proteins and the DNA. Equilibrium experiments show that regulatory proteins indeed bind tightly to their target site. However, they also find strong binding to other, non-specific sites, which act as traps that can dramatically increase the time needed to locate the target. This gives rise to a conflict between the speed and stability requirements. Here we suggest a simple mechanism that can resolve this long-standing paradox by allowing proteins to locate their target sites within short time scales even in the presence of traps. Our theoretical analysis shows that the mechanism is robust in the presence of generic disorder in the DNA sequence and does not require a specially designed target site.



It is commonly believed that three-dimensional diffusion alone is too slow for proteins to locate their specific target on a DNA molecule quickly enough for cells to function properly. To resolve this issue, Berg and von Hippel suggested, in a series of seminal papers [1, 2], that combining periods of one-dimensional diffusion along the DNA (sliding) with periods of three-dimensional diffusion off the DNA (jumping) can speed up the search by several orders of magnitude. Since then, sliding (or, equivalently, binding of proteins to non-specific DNA sequences) has been observed in many experiments [3-5] and is now believed to be a common mechanism [6-11]. On the other hand, as pointed out already in [12], experimental and theoretical works have shown that the binding energies of a protein to different DNA sequences are very large, a direct consequence of the required stability of the protein-target complex. The binding energies can be well fitted by a Gaussian, with the strongest binding energies of the order of $30\,k_BT$ and a standard deviation of the order of $5\,k_BT$ [13]. This casts a cloud on the simple facilitated-diffusion picture of Berg and von Hippel: the binding-energy distribution suggests an unacceptably slow search, with very slow sliding and deep traps [14]. This unresolved conflict is called the speed-stability paradox [15].

Here, motivated by direct experimental observations [3, 14-16] and by the theoretical work of Slutsky and Mirny [10], we consider a model in which the protein, when bound to the DNA, can switch between two conformations separated by a free-energy barrier. In the first, termed the search state, the protein is loosely bound to the DNA and can slide along it. In the second, the recognition state, it is trapped in a deep energy well. Note that equilibrium measurements of binding energies to the DNA are controlled by the recognition state.

In this paper, based on a quantitative analysis of this model, we argue that, because several time scales occur in the search process, the widely used definition of the reaction rate of a single protein as the inverse of the average search time $t_{\rm ave}$ [17] is generally irrelevant as a measure of the efficiency of target location on DNA. When $n_p$ proteins are searching for the target, the relevant quantity is the probability $\mathcal{R}_{n_p}(t)$ for a reaction to occur before time $t$. We show below that $\mathcal{R}_{n_p}(t)$ can reach values close to one on a time scale $t_{\rm typ}^{(n_p)}$ that can be orders of magnitude smaller than the value $t_{\rm ave}/n_p$ expected from the usual approach.

FIG. 1: An illustration of the model. (a) A time sequence of a protein sliding in the s mode (green circle), diffusing off the DNA (blue circle) and entering the target site in the r mode (red oval). (b) A protein finding the target after entering the r state. (c) An illustration of the rates and of the energy landscape that governs them at each location, $i = 1,\dots,N$, along the DNA.

Our analysis has several important merits. First, it yields a fast search despite a very strong binding of the protein in the recognition state to any site on the DNA. We suggest that the measured binding energies of proteins to the DNA are irrelevant to the kinetics of the search process; the relevant quantities are transition rates (specified below). Second, it shows that, in the realistic case of generic disorder in the barrier height, the search can be very effective even if the target site is not specially designed. If experimentally verified, the proposed mechanism would resolve the speed-stability paradox.

FIG. 2: A plot of $\mathcal{R}(t)$ for $N = 10^6$ (empty circles) and $N = 10^8$ (filled squares). Lines correspond to Eq. (4) with $\tau_1$, $\tau_2$ and $q$ derived analytically. Here $\lambda_u = 10^{-2}\lambda_0$, $\lambda_b = 0.1\lambda_0$, $\lambda_r = 10^{-7}\lambda_0$ and $\lambda_s = 10^{-9}\lambda_0$. These correspond to energies, measured relative to the s mode, of $E_u = 4.6\,k_BT$, $E_{\rm barrier} = 16.12\,k_BT$ and $E_r = -4.6\,k_BT$. Experiments suggest $\lambda_0 \approx 10^6\ {\rm s}^{-1}$ for the Lac repressor [6].

FIG. 3: A plot of $\mathcal{R}_{n_p}(t)$ for $n_p = 1$ (empty circles) and $n_p = 10$ (filled squares). Here $N = 10^6$, $\lambda_u = 10^{-4}\lambda_0$, $\lambda_b = 0.1\lambda_0$, $\lambda_r = 10^{-7}\lambda_0$ and $\lambda_s = 10^{-9}\lambda_0$. These correspond to energies, measured relative to the s mode, of $E_u = 9.2\,k_BT$, $E_{\rm barrier} = 16.12\,k_BT$ and $E_r = -4.6\,k_BT$. Lines correspond to Eqs. (4) and (5) with calculated values of $\tau_1$, $\tau_2$ and $q$. Note that here $\lambda_u$ differs from that of Fig. 2.

The model consists of $n_p$ proteins, each of which can be in three states: (i) an unbound state, u, in which it performs three-dimensional diffusion (jumping); (ii) a search state, s, in which it is weakly bound to the DNA and performs one-dimensional diffusion (sliding); and (iii) a recognition state, r, in which it is tightly bound to the DNA. We assume, for simplicity, that in the recognition state the protein is trapped in a deep energy well (as justified by the experimentally measured strong binding energies) and is unable to move [10]. The transition rates $\lambda_r^i$, $\lambda_s^i$, $\lambda_b$ and $\lambda_u$ between the different states are defined in Fig. 1. To model sliding, in the s state the protein moves with rate $\lambda_0/2$ to each neighboring site on the DNA. Note that the rates $\lambda_r^i$ and $\lambda_s^i$ may depend on the location $i = 1,\dots,N$ along the DNA. In principle, $\lambda_0$ and $\lambda_u$ also depend on $i$; as justified later, this has a weaker effect on our results, and we omit it for clarity. Finally, we assume that after a jump the protein relocates to a random position on the DNA, owing to the packed conformation of the DNA [18].
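The captions of Figs. 2 and 3 quote energies that correspond to the chosen rates. Below is a minimal sketch of that conversion. It assumes two things that are ours: Arrhenius-type rates $\lambda = \lambda_0 e^{-E/k_BT}$ for the barrier crossings, and detailed balance $\lambda_r/\lambda_s = e^{-E_r/k_BT}$ for the depth of the r mode, with energies in units of $k_BT$ and measured from the s mode. The helper name `energies_from_rates` is hypothetical.

```python
import math


def energies_from_rates(lam_u, lam_r, lam_s, lam0=1.0):
    """Convert rates (same units as lam0) into energies in units of k_B T.

    Assumption: lam = lam0 * exp(-E / k_B T) for barrier crossings, and
    lam_r / lam_s = exp(-E_r / k_B T) for the depth of the r well.
    """
    E_u = math.log(lam0 / lam_u)        # barrier for leaving the DNA from the s mode
    E_barrier = math.log(lam0 / lam_r)  # barrier separating the s and r modes
    E_r = -math.log(lam_r / lam_s)      # depth of the r mode relative to the s mode
    return E_u, E_barrier, E_r


# Parameters of Fig. 2, in units of lambda_0
E_u, E_barrier, E_r = energies_from_rates(lam_u=1e-2, lam_r=1e-7, lam_s=1e-9)
print(f"E_u = {E_u:.2f}, E_barrier = {E_barrier:.2f}, E_r = {E_r:.2f}")
```

Up to the displayed precision, this reproduces the caption values $4.6$, $16.12$ and $-4.6\,k_BT$. With $\lambda_u = 10^{-4}\lambda_0$ it gives $E_u \approx 9.2\,k_BT$, as in Fig. 3.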

To understand the difference between the two time scales $t_{\rm typ}^{(n_p)}$ and $t_{\rm ave}/n_p$, we first consider $n_p = 1$ in a simplified model where $\lambda_r^i$ and $\lambda_s^i$ are independent of $i$, except at the target site T, where $\lambda_r^T = \infty$ and $\lambda_s^T = 0$. The disorder of the DNA sequence is neglected, and the target is designed such that a reaction takes place at the first visit of the target site. As stated above, we are interested in the probability $\mathcal{R}(t) = \int_0^t P(t')\,dt'$ that a reaction occurs before time $t$, where $P(t)$ is the distribution of the first-passage time (FPT) [19-21] to the target (we drop the subscript when $n_p = 1$).
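In this simplified model, $\mathcal{R}(t)$ can also be estimated numerically with a continuous-time Gillespie simulation. The sketch below is one possible implementation and not the authors' code. Several choices in it are our own assumptions:

- sites $0,\dots,N-1$ form a ring, with the target at site 0;
- the protein starts in the s state at a uniformly random site;
- after a jump, it rebinds at a uniformly random site, so rebinding onto the target counts as finding it.

The names `first_passage_time` and `empirical_R` are hypothetical.

```python
import random


def first_passage_time(N, lam0, lam_u, lam_b, lam_r, lam_s, rng):
    """One Gillespie run of the simplified model with n_p = 1 (requires lam_s > 0).

    The target is site 0 on a ring of N sites; since lambda_r^T = inf there,
    the reaction happens at the first visit of site 0 in the s state.
    """
    t = 0.0
    x = rng.randrange(N)  # start in the s state at a random site
    state = "s"
    while True:
        if state == "s":
            if x == 0:
                return t
            total = lam0 + lam_r + lam_u
            t += rng.expovariate(total)
            r = rng.random() * total
            if r < lam0:                       # slide left or right, each at rate lam0/2
                x = (x + (1 if r < 0.5 * lam0 else -1)) % N
            elif r < lam0 + lam_r:             # s -> r: fall into a trap
                state = "r"
            else:                              # s -> u: unbind from the DNA
                state = "u"
        elif state == "r":
            t += rng.expovariate(lam_s)        # r -> s: escape from the trap
            state = "s"
        else:
            t += rng.expovariate(lam_b)        # 3D excursion, then rebinding
            x = rng.randrange(N)               # at a random site
            state = "s"


def empirical_R(samples, t):
    """Fraction of runs that reached the target before time t."""
    return sum(1 for s in samples if s <= t) / len(samples)


rng = random.Random(1)
samples = [first_passage_time(50, 1.0, 1e-2, 0.1, 1e-3, 1e-2, rng) for _ in range(200)]
print(empirical_R(samples, 100.0), empirical_R(samples, 1000.0))
```

A short ring keeps the run fast. Realistic values such as $N = 10^6$ require far more simulation time.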

The Laplace transform, $\tilde P(s) = \int_0^\infty e^{-st}P(t)\,dt$, of $P(t)$ can be obtained exactly. To do this we consider a DNA molecule of $N$ sites. For simplicity we take a centered target site (labeled 0). Consider, first, the joint probability density for a protein to find the target at time $t = t_s + t_r$, starting from a location $x_0$ at $t = 0$, before unbinding from the DNA. Here $t_s$ is the total time spent in the s state and $t_r$ is the total time spent in the r state. If exactly $n$ transitions occurred from the s state to the r state, this density is given by

$$P_n(t_s,t_r|x_0) = \lambda_s\,\mathcal{P}(n-1,\lambda_s,t_r)\,\mathcal{P}(n,\lambda_r,t_s)\,j(t_s|x_0)\,e^{-\lambda_u t_s}, \qquad (1)$$

where $\mathcal{P}(n,\mu,t) = (\mu t)^n e^{-\mu t}/n!$ is the Poisson distribution, and we use the convention $\mathcal{P}(-1,\mu,t) = \delta(t)/\mu$. Here $j(t|x_0)$ is the FPT density at the target $x = 0$ for an ordinary random walk starting from $x_0$, whose functional form has been derived previously. The FPT density before unbinding, starting from $x_0$, then reads:

$$J(t|x_0) = \sum_{n=0}^{\infty}\int_0^\infty dt_s\int_0^\infty dt_r\,\delta(t_s+t_r-t)\,P_n(t_s,t_r|x_0). \qquad (2)$$
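Summing Eq. (2) in Laplace space relies on the identity $\int_0^\infty e^{-st}\,\mathcal{P}(n,\mu,t)\,dt = \mu^n/(s+\mu)^{n+1}$. Below is a quick numerical sanity check of that identity. It uses a plain trapezoidal rule, which is our choice, and the helper names are hypothetical.

```python
import math


def poisson_weight(n, mu, t):
    """Poisson distribution P(n, mu, t) = (mu t)^n e^{-mu t} / n!."""
    return (mu * t) ** n * math.exp(-mu * t) / math.factorial(n)


def laplace_trapezoid(f, s, t_max, steps=100_000):
    """Trapezoidal approximation of int_0^t_max e^{-s t} f(t) dt."""
    h = t_max / steps
    acc = 0.5 * (f(0.0) + math.exp(-s * t_max) * f(t_max))
    for k in range(1, steps):
        t = k * h
        acc += math.exp(-s * t) * f(t)
    return acc * h


n, mu, s = 3, 2.0, 0.5
numeric = laplace_trapezoid(lambda t: poisson_weight(n, mu, t), s, t_max=40.0)
closed_form = mu ** n / (s + mu) ** (n + 1)
print(numeric, closed_form)
```

The cutoff `t_max = 40` is safe here because the integrand decays like $e^{-(s+\mu)t}$.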

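After a Laplace transform, and using $\tilde{\mathcal{P}}(n,\mu,s) = \mu^n/(s+\mu)^{n+1}$, we find $\tilde J(s|x_0) = \tilde j(u(s)|x_0)$ with

$$u(s) = \frac{s(s+\lambda_r+\lambda_s+\lambda_u)+\lambda_s\lambda_u}{s+\lambda_s}.$$

Averaging over $x_0$ and following [1, 22], we finally obtain

$$\tilde P(s) = \frac{\tilde j(u(s))}{1-\dfrac{\lambda_b}{s+\lambda_b}\,\dfrac{\lambda_u}{u(s)}\,\bigl[1-\tilde j(u(s))\bigr]},$$

where $\tilde j(s) = \langle \tilde j(s|x)\rangle_x \simeq \frac{1}{N}\,\frac{1+\rho}{1-\rho}$ for large $N$. Here $\rho(s) = 1+s/\lambda_0-\sqrt{(1+s/\lambda_0)^2-1}$ is the Laplace transform of the FPT between neighboring sites. For $s \ll \lambda_0$ this reduces to $\tilde j(s) \simeq \sqrt{2\lambda_0/s}/N$.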

FIG. 4: Plot of $\mathcal{R}_{n_p}(t)$ for $n_p = 1$ (empty circles) and $n_p = 10$ (filled squares) for the disordered model. The lines were obtained by fitting the form $1-\bigl(qe^{-t/\tau_1}+(1-q)\bigr)^{n_p}$ to the numerical simulations, with $q = 0.2817$, $\lambda_0\tau_1 = 1.7\cdot 10^7$ and $\tau_2 = \infty$. These are close to the mean-field prediction $q = 0.2827$, $\lambda_0\tau_1 = 1.1\cdot 10^7$. Here $\lambda_u = 10^{-2}\lambda_0$ ($E_u = 4.6\,k_BT$), $\lambda_b = 0.1\lambda_0$, $E_0 = 30\,k_BT$ and $\sigma = 5.3\,k_BT$. Note that here the average height of the barrier at the target site is $6.25\,k_BT$.

The results, along with numerics performed using a standard continuous-time Gillespie algorithm, are shown in Fig. 2. As is clearly evident, for a realistic range of parameters (we take barrier heights of the same order of magnitude as the experimentally measured binding energies), $\mathcal{R}(t)$ reaches a plateau close to one on a typical time scale $t_{\rm typ}$. For $N = 10^6$, this time scale is much shorter than the average search time $t_{\rm ave} = -\partial_s\tilde P(s)|_{s=0}$. Quantitatively, the typical search time $t_{\rm typ}$ can be defined, for example, through the median, $\mathcal{R}(t_{\rm typ}) = 1/2$. For analytical purposes, we find it useful to define it instead through

$$\int_0^\infty e^{-t/t_{\rm typ}}P(t)\,dt = \tilde P(1/t_{\rm typ}) = 1/2. \qquad (3)$$

Experimentally, the relevant time, by which almost all search processes have ended, is $t_{\rm typ}$ and not $t_{\rm ave}$.
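Because $\tilde P(s)$ decreases monotonically in $s$, definition (3) can be solved for $t_{\rm typ}$ with a one-dimensional root search. The sketch below uses bisection on $\log s$, together with the large-$N$ expressions for $\tilde j$ and $u(s)$. Rates are in units of $\lambda_0$, the parameters are those of Fig. 2, and the helper names are hypothetical.

```python
import math


def rho(s, lam0):
    a = 1.0 + s / lam0
    return a - math.sqrt(a * a - 1.0)


def P_tilde(s, N, lam0, lam_u, lam_b, lam_r, lam_s):
    """Closed-form P(s) of the text, with the large-N j(s) evaluated at u(s)."""
    u = (s * (s + lam_r + lam_s + lam_u) + lam_s * lam_u) / (s + lam_s)
    r = rho(u, lam0)
    j = (1.0 + r) / (N * (1.0 - r))
    return j / (1.0 - lam_b * lam_u * (1.0 - j) / ((s + lam_b) * u))


def typical_time(params, log_s_lo=math.log(1e-15), log_s_hi=math.log(1e2), iters=200):
    """Solve P(1/t_typ) = 1/2 (Eq. (3)) by bisection in log s."""
    lo, hi = log_s_lo, log_s_hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if P_tilde(math.exp(mid), **params) > 0.5:
            lo = mid  # P still above 1/2: the root lies at larger s
        else:
            hi = mid
    return math.exp(-0.5 * (lo + hi))  # t_typ = 1/s*


params = dict(N=10**6, lam0=1.0, lam_u=1e-2, lam_b=0.1, lam_r=1e-7, lam_s=1e-9)
t_typ = typical_time(params)
print(t_typ, P_tilde(1.0 / t_typ, **params))
```

For these parameters the root lies at a few times $10^7/\lambda_0$, more than an order of magnitude below $t_{\rm ave} \approx 7\times10^8/\lambda_0$.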

Importantly, the distribution $\mathcal{R}(t)$ has two intrinsic time scales, one short and one long, and can in practice be well approximated by

$$\mathcal{R}(t) \simeq 1 - qe^{-t/\tau_1} - (1-q)e^{-t/\tau_2}, \qquad (4)$$



where $q$, $\tau_1$ and $\tau_2$ can be calculated analytically. This form allows an explicit determination of $t_{\rm typ}$ (through Eq. (3)) and enables the following interpretation.

The short time scale $\tau_1 = -q^{-1}\,\partial_s\tilde P(s)\big|_{\lambda_s=0,\,s=0}$ characterizes events where the protein never enters the r state. It is therefore independent of the binding energy $E_r$ (and hence of $\lambda_s$). The quantity $q = \tilde P(s=0)\big|_{\lambda_s=0}$ is the probability of such an event. The time scale $\tau_2 = (t_{\rm ave}-q\tau_1)/(1-q)$ characterizes events where the protein enters the r state, and is therefore much larger than $\tau_1$ in the case of strong binding (small $\lambda_s$).

As illustrated in Fig. 2, the competition between the two time scales can lead, for experimentally relevant DNA lengths, to a significant difference between the typical and average times. More precisely, we find the following for DNA lengths $N < \sqrt{2\lambda_0\lambda_u}/\lambda_r$. The probability $q$ is of order one, and $t_{\rm typ} \sim \tau_1 \simeq N\sqrt{\lambda_u/(2\lambda_0)}\,(\lambda_u^{-1}+\lambda_b^{-1})$ is independent of $\lambda_s$, the only rate that depends on the binding energy in the r mode. The relevant time scale of the search process, $t_{\rm typ}$, can therefore be much shorter than $t_{\rm ave} \simeq N\lambda_r/(\lambda_s\sqrt{2\lambda_0\lambda_u})$, even in the presence of deep traps (small $\lambda_s$).
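The leading-order estimates above are simple enough to tabulate directly. The sketch below evaluates $\tau_1$, $t_{\rm ave}$ and the largest length $N_{\max} = \sqrt{2\lambda_0\lambda_u}/\lambda_r$ for which $q$ stays of order one, together with the two-scale form of Eq. (4). It uses the Fig. 2 parameters, with rates in units of $\lambda_0$. The function names are hypothetical, and the formulas hold only in the large-$N$ regime discussed in the text.

```python
import math


def two_scale_estimates(N, lam0, lam_u, lam_b, lam_r, lam_s):
    """Leading-order tau_1, t_ave and N_max quoted in the text."""
    tau1 = N * math.sqrt(lam_u / (2 * lam0)) * (1 / lam_u + 1 / lam_b)
    t_ave = N * lam_r / (lam_s * math.sqrt(2 * lam0 * lam_u))
    N_max = math.sqrt(2 * lam0 * lam_u) / lam_r  # q = O(1) requires N < N_max
    return tau1, t_ave, N_max


def R_two_scales(t, q, tau1, tau2):
    """Eq. (4): R(t) ~ 1 - q exp(-t/tau1) - (1 - q) exp(-t/tau2)."""
    return 1 - q * math.exp(-t / tau1) - (1 - q) * math.exp(-t / tau2)


tau1, t_ave, N_max = two_scale_estimates(1e6, 1.0, 1e-2, 0.1, 1e-7, 1e-9)
print(f"tau1 ~ {tau1:.3g}, t_ave ~ {t_ave:.3g}, N_max ~ {N_max:.3g}")
```

This gives $\tau_1 \approx 7.8\times10^6/\lambda_0$, $t_{\rm ave} \approx 7.1\times10^8/\lambda_0$ and $N_{\max} \approx 1.4\times10^6$. So $N = 10^6$ sits inside the regime where $t_{\rm typ} \ll t_{\rm ave}$.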

This interesting regime, where $t_{\rm typ} \ll t_{\rm ave}$, requires a rather large barrier between the s and r states in the case of long DNA molecules (namely, $\lambda_r < \sqrt{2\lambda_0\lambda_u}/N$). We now argue that this constraint can be, to a large extent, relaxed when $n_p$ proteins search for the target simultaneously. In this case, even when $t_{\rm ave} \approx t_{\rm typ}$ for a single protein, the typical search time $t_{\rm typ}^{(n_p)}$ of $n_p$ proteins can be significantly shorter than $t_{\rm ave}/n_p$, even for relatively small values of $n_p$ such as 10 to 15. Here, again, $t_{\rm ave}$ is the average search time of a single protein, and $t_{\rm typ}^{(n_p)}$ is defined as in Eq. (3). For $n_p$ proteins, the first-passage distribution $P_{n_p}(t)$ entering that definition is deduced from the cumulative distribution



$$\mathcal{R}_{n_p}(t) = 1-\bigl(1-\mathcal{R}(t)\bigr)^{n_p}. \qquad (5)$$



In Fig. 3 we show the results for $\mathcal{R}_{n_p}(t)$ with $n_p = 10$. Note that, as claimed above, $t_{\rm typ}^{(n_p)} < t_{\rm ave}/n_p$, whereas $t_{\rm typ}$ is close to $t_{\rm ave}$ for one protein. This can be understood as follows. Inserting the approximate form, Eq. (4), into Eq. (5), it is clear that when $\tau_2 \gg \tau_1$ the decay of $\mathcal{R}_{n_p}(t)$ is dominated by $\tau_1$ as long as $(1-q)^{n_p} \ll 1$.

In essence, only one protein needs to find the target. A catastrophic event, in which the search time is of the order of $\tau_2$, therefore has probability $p_{\rm cat} = (1-q)^{n_p}$, which decays exponentially fast with $n_p$. For large enough values of $n_p$, the short time scale $\tau_1$ controls the behavior of $\mathcal{R}_{n_p}(t)$, even if it is insignificant for the search time of one protein. This implies that searches involving several proteins strongly suppress the long time scales, induced by the traps, that control $t_{\rm ave}$. The typical search time is then given by $t_{\rm typ}^{(n_p)} \simeq \tau_1/m$, where $m$ is of the order of $n_p$, and is therefore again largely independent of the binding energy of the r mode. This makes fast searches possible even in the presence of deep traps, enabling both speed and stability.
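The suppression of catastrophic events can be made concrete with the two-scale form. The sketch below combines Eqs. (4) and (5), computes $p_{\rm cat} = (1-q)^{n_p}$, and finds the median of the $n_p$-protein search time by bisection. It uses the median rather than the Laplace-based definition (3) to keep the example short. The value $q = 0.2817$ and $\tau_2 = \infty$ are taken from the fit quoted for Fig. 4; time is in units of $\tau_1$, and the function names are hypothetical.

```python
import math


def R_single(t, q, tau1, tau2=math.inf):
    """Two-scale form of Eq. (4); tau2 = inf models infinitely deep traps."""
    slow = 1.0 if math.isinf(tau2) else math.exp(-t / tau2)
    return 1.0 - q * math.exp(-t / tau1) - (1.0 - q) * slow


def R_multi(t, n_p, q, tau1, tau2=math.inf):
    """Eq. (5): probability that at least one of n_p proteins has reacted."""
    return 1.0 - (1.0 - R_single(t, q, tau1, tau2)) ** n_p


def p_cat(q, n_p):
    """Probability that all n_p proteins are trapped before reaching the target."""
    return (1.0 - q) ** n_p


def median_time(n_p, q, tau1, tau2=math.inf, cap=1e12):
    """Median search time of n_p proteins, or inf if R_multi never reaches 1/2."""
    hi = tau1
    while R_multi(hi, n_p, q, tau1, tau2) < 0.5:
        hi *= 2.0
        if hi > cap * tau1:
            return math.inf
    lo = 0.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if R_multi(mid, n_p, q, tau1, tau2) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


q, tau1 = 0.2817, 1.0
for n_p in (1, 10):
    print(n_p, p_cat(q, n_p), median_time(n_p, q, tau1))
```

With a single protein the median is infinite, because $q < 1/2$ and the traps never release the protein. With $n_p = 10$, $p_{\rm cat} \approx 0.037$ and the median drops to about $0.27\,\tau_1$, that is, $m \approx 3.7$.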

We now argue that this mechanism of fast search can still be at play when the binding energy of the protein to the DNA is strongly disordered, as observed in experiments. To account for this, we consider the case where the barrier height is drawn from a Gaussian distribution, $p(E_b) = e^{-(E_b-E_0)^2/2\sigma^2}/\sqrt{2\pi\sigma^2}$.

Importantly, in the presence of disorder we can propose an intrinsic definition of the target as the site with the lowest barrier, with no specifically designed properties. Indeed, our previous assumption of $\lambda_r^T = \infty$ at the target site and small $\lambda_r^i$ everywhere else is a rather strong demand. Since the target sequence is of the order of 10 base pairs, many sequences with similar properties are very likely to exist unless the DNA sequence is carefully tailored.

To analyze this model, we combine numerics with a mean-field analysis. For simplicity, we consider the extreme case where all recognition sites are infinitely long-lived, $\lambda_s = 0$ (or, equivalently, $\tau_2 = \infty$), which obviously fulfills the stability requirement. Note that the average search time is then infinite.
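The intrinsic definition of the target is easy to illustrate on a sampled landscape. The sketch below draws Gaussian barrier heights (in units of $k_BT$), picks the site with the lowest barrier as the target, and converts barriers into $s \to r$ rates. The rate form $\lambda_r^i = \lambda_0 e^{-E_b^i/k_BT}$, the value $N = 10^5$ and the variable names are our assumptions for illustration.

```python
import math
import random


def sample_barriers(N, E0, sigma, rng):
    """Independent Gaussian barrier heights E_b (units of k_B T), one per site."""
    return [rng.gauss(E0, sigma) for _ in range(N)]


rng = random.Random(0)
N, E0, sigma, lam0 = 10**5, 30.0, 5.3, 1.0
barriers = sample_barriers(N, E0, sigma, rng)

# Intrinsic target: the site with the lowest barrier, with no designed properties
target = min(range(N), key=barriers.__getitem__)
lam_r = [lam0 * math.exp(-E) for E in barriers]  # s -> r rate at each site

# Low secondary minima mark the sites where the protein most easily gets trapped
sorted_E = sorted(barriers)
print(target, sorted_E[0], sorted_E[1], lam_r[target])
```

The gap between the lowest and second-lowest barriers is what the trade-off in $\sigma$ discussed below hinges on.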

FIG. 5: Results for the disordered model. Here $N = \ldots$, $\lambda_u = 10^{-2}\lambda_0$ ($E_u = 4.6\,k_BT$), $\lambda_b = 0.1\lambda_0$ and $E_0 = 30\,k_BT$. (a) $p_{\rm cat}$ as a function of $\sigma$ for $n_p = 1$ and $n_p = 10$. (b) $t_{\rm typ}^{(n_p)}$ for $n_p = 10$ and $\tau_1$, plotted as functions of $\sigma$. Using $\lambda_0 = 10^6\ {\rm s}^{-1}$ [6], for $n_p = 10$ at the minimal $p_{\rm cat}$ we find a typical search time of about 10 s.

Within the mean-field approach we replace the different quantities by their disorder averages and account separately for the barrier at the target site. We first compute the disorder-averaged probability of crossing the barrier at the target at each visit. Given the distribution $p_{\min}(E)$ of the minimum of the barrier [24], this probability is

$$p_1 = \int dE\, p_{\min}(E)\,\frac{e^{-E/k_BT}}{1+\lambda_u/\lambda_0+e^{-E/k_BT}}.$$

Here we set the time scale of the activation process across the barrier to be $\lambda_0$. We finally assume that the expression for $u(s)$ of the non-disordered model holds with $\lambda_r$ replaced by

$$\bar\lambda_r = \lambda_0\int \frac{e^{-E/k_BT}\,e^{-(E-E_0)^2/2\sigma^2}}{\sqrt{2\pi\sigma^2}}\,dE,$$

and with $\tilde j$ replaced by

$$\tilde j(z) \to \frac{p_1\,\tilde j(z)}{1-(1-p_1)\,\tilde j_0(z)}, \qquad (6)$$

where $\tilde j_0(s)$ is the generating function of the first return time to a site [22].
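The mean-field ingredients can be evaluated with a few lines of quadrature. The sketch below rests on several assumptions, listed here:

- energies are in units of $k_BT$;
- $p_{\min}(E)$ is taken as the density of the lowest of $N$ independent Gaussian barriers;
- $p_1$ is integrated with a trapezoidal rule over a window wide enough to contain $p_{\min}$;
- $\bar\lambda_r$ uses the exact Gaussian moment-generating function, $\lambda_0 e^{-E_0+\sigma^2/2}$;
- the generating function $\tilde j_0$ is left as an input to the Eq. (6) substitution, since its form is not given here.

All names are hypothetical, and the parameters follow Fig. 4 with $N = 10^6$ chosen by us.

```python
import math


def gaussian_pdf(E, E0, sigma):
    return math.exp(-(E - E0) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)


def gaussian_sf(E, E0, sigma):
    """Probability that a single barrier is higher than E."""
    return 0.5 * math.erfc((E - E0) / (sigma * math.sqrt(2.0)))


def p_min_pdf(E, N, E0, sigma):
    """Density of the lowest of N independent Gaussian barriers."""
    return N * gaussian_pdf(E, E0, sigma) * gaussian_sf(E, E0, sigma) ** (N - 1)


def trapezoid(f, lo, hi, n=20_000):
    h = (hi - lo) / n
    return h * (0.5 * f(lo) + 0.5 * f(hi) + sum(f(lo + k * h) for k in range(1, n)))


def crossing_probability(N, E0, sigma, lam_u_over_lam0):
    """Mean-field p_1: chance of crossing the target barrier during one visit."""
    def integrand(E):
        b = math.exp(-E)  # Boltzmann factor of the barrier
        return p_min_pdf(E, N, E0, sigma) * b / (1.0 + lam_u_over_lam0 + b)
    return trapezoid(integrand, E0 - 10 * sigma, E0 + 2 * sigma)


def lambda_r_bar(lam0, E0, sigma):
    """lam0 * int p(E) e^{-E} dE, evaluated in closed form for Gaussian p(E)."""
    return lam0 * math.exp(-E0 + 0.5 * sigma ** 2)


def j_mean_field(j_z, j0_z, p1):
    """Eq. (6): effective j(z) given j(z) and the first-return function j0(z)."""
    return p1 * j_z / (1.0 - (1.0 - p1) * j0_z)


N, E0, sigma = 10**6, 30.0, 5.3
norm = trapezoid(lambda E: p_min_pdf(E, N, E0, sigma), E0 - 10 * sigma, E0 + 2 * sigma)
p1 = crossing_probability(N, E0, sigma, lam_u_over_lam0=1e-2)
print(norm, p1, lambda_r_bar(1.0, E0, sigma))
```

`norm` checks that the integration window captures essentially all of $p_{\min}$. For $p_1 = 1$, the substitution (6) leaves $\tilde j$ unchanged, as expected when every visit to the target succeeds.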

First, we show that the two-scale scenario described above still holds. Indeed, Fig. 4 shows that $\mathcal{R}(t)$ is well fitted by Eq. (4) for realistic values of the parameters. This implies that, for large enough $n_p$, the only relevant time scale is $\tau_1$, and the typical search time again takes the form $t_{\rm typ}^{(n_p)} \simeq \tau_1/m$, with $m$ of the order of $n_p$. This enables a fast search even in the presence of infinitely deep traps.

The regime of a fast search, with $t_{\rm typ}^{(n_p)}$ independent of the trap depth $E_r$, also requires, as above, a small $p_{\rm cat}$. We now show that this condition holds in a wide range of disorder parameters. To illustrate this, Fig. 5 shows the dependence of $p_{\rm cat}$ and $t_{\rm typ}^{(n_p)}$ on $\sigma$ (holding all other variables constant), obtained from numerics and from the mean-field treatment, for realistic values of the parameters.

Notably, $p_{\rm cat}$ can be minimized as a function of $\sigma$. For small values of $\sigma$, the DNA sequence has to be scanned many times before the target is entered in the r mode. Increasing $\sigma$ lowers the barrier at the target and therefore reduces the number of scans needed, which diminishes $p_{\rm cat}$. For larger $\sigma$, the chance of falling into a trap increases owing to lower secondary minima of the barrier, which leads to an increase of $p_{\rm cat}$.

As expected, $p_{\rm cat}$ decreases dramatically when $n_p$ is increased, even by a few units, and can remain small for a wide range of values of $\sigma$. For larger $\sigma$, $p_{\rm cat}$ increases and $t_{\rm typ}^{(n_p)}$ rises quickly as it starts to depend on $\tau_2$.

Most importantly, as advertised above, these results show that it is possible to obtain relatively small values of $t_{\rm typ}^{(n_p)}$ and $p_{\rm cat}$ with realistic values of the parameters (see Fig. 5). Reasonable search times (in the range of seconds) are obtained for a rather large range of $\sigma$, as long as $n_p$ is of the order of ten or more proteins. This holds even in the extreme case of infinitely deep traps, suggesting a possible way to reconcile the speed and stability requirements. We note that, with moderate changes in $E_0$, similar results can be obtained for much longer DNA sequences.